AITopics | transformer encoder

In recent years, models based on the Transformer architecture have seen widespread applications and have become one of the core tools in the field of deep learning. Numerous successful techniques, such as parameter-efficient fine-tuning and efficient scaling, have been proposed surrounding their applications to further enhance performance. However, the success of these strategies has always lacked the support of rigorous mathematical theory. To study the underlying mechanisms behind Transformers and related techniques, we first propose a Transformer learning framework motivated by distribution regression, with distributions being inputs, connect a two-stage sampling process with natural language processing, and present a mathematical formulation of the attention mechanism called attention operator. We demonstrate that by the attention operator, Transformers can compress distributions into function representations without loss of information. Moreover, with the advantages of our novel attention operator, Transformers exhibit a stronger capability to learn functionals with more complex structures than convolutional neural networks and fully connected networks. Finally, we obtain a generalization bound within the distribution regression framework. Through the aforementioned theoretical results, we further discuss some successful techniques emerging with large language models (LLMs), such as prompt tuning, parameter-efficient fine-tuning, and efficient scaling. We also provide theoretical insights behind these techniques within our novel analysis framework.

attention operator, large language model, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1162/neco_a_01726

2606.29256

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

18d10dc6e666eab6de9215ae5b3d54df-Paper.pdf

Neural Information Processing SystemsMay-1-2026, 01:51:14 GMT

affinity feature, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Supplementary of " Twins: Revisiting the Design of Spatial Attention in Vision Transformer "

Neural Information Processing SystemsApr-25-2026, 19:59:49 GMT

Swin transformer: Hierarchical vision transformer using shifted windows.

artificial intelligence, lsa gsa 1, machine learning, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

2fd5d41ec6cfab47e32164d5624269b1-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 08:29:49 GMT

machine learning, natural language, prediction, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)

Add feedback

0e0157ce5ea15831072be4744cbd5334-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 13:54:38 GMT

A.1 Dataset Details & Evaluation Metrics As stated earlier, the main application of Extreme Multi-label Text Classification is in e-commerce - product recommendation and dynamic search advertisement - and in document tagging, where the objective of an algorithm is to correctly recommend/advertise among the top-k slots. Thus, for evaluation of the methods, we use precision at k (denoted by P@k), and its propensity scored variant (denoted by PSP@k) [17]. These are standard and widely used metrics by the XMC community [4]. Since P@k treats all the labels equally, it doesn't reveal the performance of the model on tail labels. However, because of the long-tailed distribution in XMC datasets, one of the main challenges is to predict tail labels correctly, which may be more valuable and informative compared to head classes.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.53)
Information Technology > Artificial Intelligence > Natural Language (0.35)

Add feedback

The Power of Hard Attention Transformers on Data Sequences: A formal language theoretic perspective

Neural Information Processing SystemsMar-22-2026, 02:08:06 GMT

Formal language theory has recently been successfully employed to unravel the power of transformer encoders. This setting is primarily applicable in Natural Language Processing (NLP), as a token embedding function (where a bounded number of tokens is admitted) is first applied before feeding the input to the transformer.

artificial intelligence, natural language, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.96)

Add feedback

64f1f27bf1b4ec22924fd0acb550c235-Paper.pdf

Neural Information Processing SystemsFeb-19-2026, 03:17:49 GMT

The proposed MLP decoder aggregates information from different layers, andthus combining both local attention and global attention to render powerful representations.

artificial intelligence, arxiv, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

50ca96a1a9ebe0b5e5688a504feb6107-Paper-Conference.pdf

Neural Information Processing SystemsFeb-19-2026, 03:15:48 GMT

detection, eigen attention map, supervision, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Colorado (0.04)
Asia > China > Zhejiang Province (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models Ziyi Yin 1 Muchao Y e

Neural Information Processing SystemsFeb-16-2026, 08:15:25 GMT

Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting, which is unrealistic. In this paper, we aim to investigate a new yet practical task to craft image and text perturbations using pre-trained VL models to attack black-box fine-tuned models on different downstream tasks.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.84)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Add feedback

Activating Self-Attention for Multi-Scene Absolute Pose Regression

Neural Information Processing SystemsFeb-12-2026, 01:21:57 GMT

Multi-scene absolute pose regression addresses the demand for fast and memory-efficient camera pose estimation across various real-world environments.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.04)

Genre: